Reconciliation with non-binary species trees.

Abstract

Reconciliation extracts information from the topological incongruence between gene and species trees to infer duplications and losses in the history of a gene family. The inferred duplication-loss histories provide valuable information for a broad range of biological applications, including ortholog identification, estimating gene duplication times, and rooting and correcting gene trees. While reconciliation for binary trees is a tractable and well-studied problem, there are no algorithms for reconciliation with non-binary species trees. Yet a striking proportion of species trees are non-binary. For example, 64% of branch points in the NCBI taxonomy have three or more children. When applied to non-binary species trees, current algorithms overestimate the number of duplications because they cannot distinguish between duplication and incomplete lineage sorting. We present the first algorithms for reconciling binary gene trees with non-binary species trees under a duplication-loss parsimony model. Our algorithms use an efficient mapping from gene to species trees to infer the minimum number of duplications in O(|V(G)| × (k(S) + h(S))) time, where |V(G)| is the number of nodes in the gene tree, h(S) is the height of the species tree, and k(S) is the size of its largest polytomy. We present a dynamic programming algorithm that also minimizes the total number of losses. Although this algorithm is exponential in the size of the largest polytomy, it performs well in practice for polytomies with outdegree of 12 or less. We also present a heuristic that estimates the minimal number of losses in polynomial time. In empirical tests, this heuristic finds an optimal loss history 99% of the time. Our algorithms have been implemented in NOTUNG, a robust, production-quality tree-fitting program, which provides a graphical user interface for exploratory analysis and also supports automated, high-throughput analysis of large data sets.
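
To make the overestimation issue concrete, the following is a minimal sketch, not the algorithm implemented in NOTUNG and not one that achieves the time bound stated above. It assumes a simplified criterion: a gene-tree node counts as a required duplication only if its two child subtrees contain genes from a common child lineage of the species node it maps to. For comparison, it also applies the standard binary LCA rule, under which a node is a duplication whenever it maps to the same species node as one of its children. The data layout (a dict of species children, nested tuples for the gene tree) and all function names are hypothetical choices made for this illustration.

```python
from typing import Dict, List, Optional, Set, Tuple, Union

# Hypothetical encodings: a gene tree is either a species name (leaf)
# or a pair of gene subtrees; the species tree maps each internal node
# to its list of children (three or more children form a polytomy).
GeneTree = Union[str, Tuple["GeneTree", "GeneTree"]]
SpeciesTree = Dict[str, List[str]]


def build_parent(children: SpeciesTree) -> Dict[str, str]:
    """Invert the child lists into a child -> parent map."""
    return {kid: p for p, kids in children.items() for kid in kids}


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Path from node up to the root, including node itself."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a: str, b: str, parent: Dict[str, str]) -> str:
    """Least common ancestor of two species-tree nodes."""
    seen = set(ancestors(a, parent))
    for x in ancestors(b, parent):
        if x in seen:
            return x
    raise ValueError("nodes are not in the same tree")


def child_toward(anc: str, desc: str, parent: Dict[str, str]) -> Optional[str]:
    """Child of anc on the path down to desc (None if desc is anc)."""
    if desc == anc:
        return None
    node = desc
    while parent[node] != anc:
        node = parent[node]
    return node


def count_duplications(gene: GeneTree, children: SpeciesTree,
                       polytomy_aware: bool = True) -> int:
    """Count inferred duplications in a binary gene tree."""
    parent = build_parent(children)
    dups = 0

    def visit(g: GeneTree) -> Tuple[str, Set[str]]:
        nonlocal dups
        if isinstance(g, str):            # leaf: maps to its own species
            return g, {g}
        (ml, sl), (mr, sr) = visit(g[0]), visit(g[1])
        m = lca(ml, mr, parent)           # species node this gene node maps to
        if polytomy_aware:
            # Child lineages of m touched by each gene subtree.
            nl = {child_toward(m, s, parent) for s in sl}
            nr = {child_toward(m, s, parent) for s in sr}
            is_dup = bool(nl & nr)        # overlap forces a duplication
        else:
            is_dup = m in (ml, mr)        # standard binary LCA rule
        dups += is_dup
        return m, sl | sr

    visit(gene)
    return dups


# A three-way polytomy and a gene tree that resolves it as ((A,B),C).
species = {"root": ["A", "B", "C"]}
gene = (("A", "B"), "C")
print(count_duplications(gene, species))                        # 0
print(count_duplications(gene, species, polytomy_aware=False))  # 1
```

On this toy input the binary LCA rule reports one duplication, while the polytomy-aware criterion reports none, since the disagreement between the gene tree and the unresolved species node can be explained without a duplication. A gene tree such as ((A,B),(A,C)) is still counted as one duplication under both rules, because species A appears in both child subtrees.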